Skip to content

fix(constraint): handle compound CHECK expressions with AND/OR operators#38

Merged
mjm918 merged 5 commits intomainfrom
fix/issue-24-check-constraint-compound-expressions
Jan 20, 2026
Merged

fix(constraint): handle compound CHECK expressions with AND/OR operators#38
mjm918 merged 5 commits intomainfrom
fix/issue-24-check-constraint-compound-expressions

Conversation

@mjm918
Copy link
Contributor

@mjm918 mjm918 commented Jan 20, 2026

Summary

  • Fixes the CHECK constraint evaluator to properly handle compound expressions with AND/OR operators
  • Previously, CHECK (age >= 0 AND age <= 150) would not reject invalid values like -5
  • Now recursively evaluates AND/OR expressions, respecting parenthesis nesting
  • Properly extracts numeric operands without including trailing operators

Root Cause

The naive string parsing found the first comparison operator (e.g., >=) and tried to parse everything after it as a number. For age >= 0 AND age <= 150, this meant parsing "0 AND age <= 150" as a number, which failed and defaulted to returning true.

Solution

  • Check for AND/OR at the top level (respecting parentheses depth)
  • Split and recursively evaluate each part
  • For simple comparisons, extract only numeric characters from operands

Test plan

  • check_constraint_violation_returns_error now passes (was failing)
  • All existing CHECK constraint tests still pass (10 tests)
  • Full test suite passes
  • Clippy clean

Fixes #24

🤖 Generated with Claude Code

The CHECK constraint evaluator was failing for compound expressions like
`CHECK (age >= 0 AND age <= 150)` because it only looked for the first
comparison operator and tried to parse everything after it as a number.

This fix adds proper handling for:
- AND expressions: recursively evaluate both sides with &&
- OR expressions: recursively evaluate both sides with ||
- Parentheses: strip outer parens and recurse
- Numeric operands: extract only the numeric portion before operators

The implementation respects parenthesis nesting to correctly split on
top-level logical operators.

Fixes #24

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Jan 20, 2026

Code Review - PR #38: Fix CHECK constraint compound expressions

Summary

This PR successfully fixes issue #24 by properly handling compound CHECK constraint expressions with AND/OR operators. The implementation is functional and the tests pass. However, there are several critical allocation issues that violate TurDB's zero-allocation principle.


Critical Issues - Zero Allocation Violations

1. String Allocation in extract_numeric_operand (Line 4501-4504)

Problem: Allocates a new String on every CHECK constraint evaluation via chars().collect().

Fix: Use byte slicing to find the numeric prefix without allocation.

2. String Allocations in eval_check_expr_recursive (Line 4394)

Problem: to_lowercase() allocates on every recursive call. For 'age >= 0 AND age <= 150', this allocates 3 times.

Fix: Use eq_ignore_ascii_case() for case-insensitive comparison without allocation.

3. Additional Allocations in eval_simple_comparison (Line 4471-4472)

Problem: Two more to_lowercase() calls per simple comparison.

Fix: Use case-insensitive byte comparison with windows() and eq_ignore_ascii_case().


Performance Impact

Current: For CHECK (age >= 0 AND age <= 150), allocates ~7-9 times per INSERT.

For bulk inserts of 10,000 rows: 70,000-90,000 unnecessary heap allocations.


Other Issues

4. Logic Bug: Operator Precedence Incorrect

Lines 4396-4404 evaluate AND before OR with equal precedence. SQL standard requires proper precedence (AND binds tighter) or left-to-right with parentheses.

5. Missing Edge Cases in Tests

Missing tests for: nested parentheses, multiple AND/OR chains, mixed operators, no-space variations.

6. Inconsistent Error Handling

Line 4496 returns true when parsing fails, silently allowing malformed constraints.


Positive Aspects

  1. Correctly handles parentheses depth tracking
  2. Recursive design is clean and maintainable
  3. Fixes the reported bug - compound expressions now work
  4. Good use of string slicing in split_on_logical_op
  5. Type-generic comparison via closure

Recommendations

Priority 1 (Must Fix)

  • Eliminate all string allocations (use byte operations)
  • Fix operator precedence bug
  • Add tests for edge cases

Priority 2 (Should Fix)

  • Handle parse errors explicitly
  • Add benchmark to measure allocation count
  • Consider pre-parsing CHECK expressions at CREATE TABLE time

Conclusion

This PR successfully solves the immediate problem, but violates TurDB core zero-allocation principle. The implementation allocates multiple strings per CHECK evaluation, which is unacceptable for a database targeting embedded/IoT devices.

Recommendation: Request changes to eliminate allocations before merging.

Great work identifying the root cause! The recursive approach is sound, just needs zero-copy refactoring.

Address critical issues from code review:

1. Zero-allocation violations fixed:
   - Replace to_lowercase() with byte-level case-insensitive comparison
   - Replace chars().collect() with byte slicing for numeric extraction
   - Add contains_ignore_ascii_case() using eq_ignore_ascii_case on windows

2. Operator precedence fixed:
   - Check OR first (lower precedence), then AND (higher precedence)
   - "a OR b AND c" now correctly evaluates as "a OR (b AND c)"

3. Performance impact eliminated:
   - Previously: 7-9 allocations per CHECK evaluation
   - Now: 0 allocations (all byte-level operations)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@mjm918
Copy link
Contributor Author

mjm918 commented Jan 20, 2026

Addressed Code Review Feedback

Fixed in d93bd77:

1. Zero-allocation violations

  • Replaced to_lowercase() with eq_ignore_ascii_case() on byte windows
  • Replaced chars().collect() with byte slicing for numeric extraction
  • Added contains_ignore_ascii_case() using eq_ignore_ascii_case on windows

2. Operator precedence

  • Now checks OR first (lower precedence), then AND (higher precedence)
  • a OR b AND c correctly evaluates as a OR (b AND c)

3. Performance impact

  • Before: 7-9 allocations per CHECK evaluation
  • After: 0 allocations (all byte-level operations)

@claude
Copy link

claude bot commented Jan 20, 2026

Code Review: CHECK Constraint Compound Expression Fix

This PR fixes the compound CHECK constraint bug. The recursive evaluation is sound, but there are critical issues that need addressing:

Critical Issues

1. Zero-Allocation Violation (database.rs:4538-4546)
The closure-based comparison violates MEMORY.md rules. Replace with enum:

enum CompareOp { Ge, Le, Gt, Lt }
impl CompareOp {
    fn compare(self, lhs: f64, rhs: f64) -> bool {
        match self { Ge => lhs >= rhs, Le => lhs <= rhs, ... }
    }
}

2. Case Sensitivity (database.rs:4396, 4401)
Only handles 'or'/'OR' and 'and'/'AND', not 'Or', 'AnD', etc. SQL is case-insensitive. Use eq_ignore_ascii_case.

3. Fail-Open Semantics (database.rs:4472-4496)
Returns true when constraint cannot be evaluated. This is a data integrity risk - should fail closed.

4. No Recursion Depth Limit
Add MAX_CHECK_EXPR_DEPTH to prevent stack overflow from deeply nested parentheses.

Testing Gaps

Missing tests for: OR operator, mixed-case operators, negative numbers, invalid expressions.

Recommendation

Please address issues 1-3 before merging (zero-allocation, SQL compliance, data integrity).


Review per .claude/rules/ guidelines

1. Replace closure with CheckCompareOp enum
   - Closure |v, t| v >= t replaced with enum-based comparison
   - Enum uses match with #[derive(Clone, Copy)] for zero overhead

2. Fix case sensitivity for AND/OR
   - Use eq_ignore_ascii_case for keyword matching
   - Handles all case variants: 'Or', 'AnD', 'AND', etc.

3. Fail-closed semantics
   - Return false when expression cannot be evaluated
   - Data integrity: malformed constraints now reject inserts

4. Add recursion depth limit
   - MAX_CHECK_EXPR_DEPTH = 32
   - Prevents stack overflow from deeply nested expressions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@mjm918
Copy link
Contributor Author

mjm918 commented Jan 20, 2026

Addressed Second Code Review Round

Fixed in ab213cd:

1. Zero-allocation: Closure → Enum

  • Replaced closure |v, t| v >= t with CheckCompareOp enum
  • Uses match-based dispatch with #[derive(Clone, Copy)]

2. Case sensitivity

  • Now uses eq_ignore_ascii_case for keyword matching
  • Handles: 'Or', 'AnD', 'AND', 'or', etc.

3. Fail-closed semantics

  • Changed from return true to return false on parse failure
  • Data integrity: malformed constraints now reject inserts

4. Recursion depth limit

  • Added MAX_CHECK_EXPR_DEPTH = 32
  • Prevents stack overflow from deeply nested expressions

@claude
Copy link

claude bot commented Jan 20, 2026

Code Review - PR #38

Great work fixing the compound CHECK constraint bug! The recursive evaluation approach is sound. I have identified a few issues that should be addressed.

Critical Issues (Must Fix Before Merge)

1. Unbalanced Parenthesis Handling (database.rs:4461)

The code does not handle malformed expressions with more closing parens than opening parens. When depth goes negative, it continues processing which could cause incorrect splits.

Current: depth -= 1 can go negative
Fix: Check if depth == 0 before decrementing and return None for malformed input

2. Integer Precision Loss for Large Values (database.rs:4571)

Converting i64 to f64 loses precision for values beyond 2^53. For example, CHECK (id >= 9007199254740993) will not work correctly.

Current: OwnedValue::Int(v) => op.compare(*v as f64, threshold)
Fix: When threshold is a whole number within i64 range, compare as integers to avoid precision loss

Performance Optimizations (Should Fix)

3. Redundant Parameter (database.rs:4448-4452)

The bytes parameter is redundant since it is always original.as_bytes(). Remove it and compute inside the function.

4. Double Trim (database.rs:4476)

strip_outer_parens calls trim() on input that is already trimmed at line 4427. Remove the redundant trim.

5. Redundant Bounds Check (database.rs:4462)

The while condition already ensures sufficient space. Remove bytes[i..].len() >= op.len() check.

Good Findings

  1. Zero-allocation target met - No heap allocations in the hot path
  2. Proper recursion depth limit - MAX_CHECK_EXPR_DEPTH = 32 prevents stack overflow
  3. Case-insensitive matching works correctly
  4. Code follows TurDB style rules perfectly

Test Coverage Gaps

Consider adding tests for:

  • Unbalanced parentheses
  • Very large integers (beyond 2^53)
  • Deeply nested expressions
  • Scientific notation (currently not supported)

Summary

Overall: Strong fix for the compound CHECK constraint bug. The recursive approach is clean and maintainable.

Must fix before merge: Issues 1 and 2
Should fix: Issues 3, 4, 5 (minor perf wins)
Style: Follows TurDB rules perfectly

- Fix unbalanced parenthesis handling: return None when depth goes negative
- Fix integer precision loss: compare as i64 when threshold is a whole number
- Remove redundant bytes parameter from split_on_logical_op_case_insensitive
- Remove double trim in strip_outer_parens (caller already trims)
- Remove redundant bounds check in split function

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Jan 20, 2026

Pull Request Review: Fix CHECK Constraint Compound Expressions

Overview

This PR fixes a critical bug in CHECK constraint evaluation where compound expressions with AND/OR operators were not properly evaluated.

Positive Aspects

  1. Correct recursive approach to handle nested expressions
  2. MAX_CHECK_EXPR_DEPTH = 32 prevents stack overflow
  3. Proper parenthesis depth tracking
  4. Case-insensitive operator matching
  5. Zero allocations - excellent adherence to memory rules

Critical Issues

1. Inconsistent Error Handling (BLOCKING)

Lines 4418-4420: When max depth exceeded, returns false (rejects valid data). Should return Result with error instead.

2. Missing Edge Case Tests (BLOCKING)

Per .claude/rules/TESTING.md, need tests for:

  • Negative numbers: CHECK (temp >= -273.15)
  • Float comparisons: CHECK (ratio >= 0.0 AND ratio <= 1.0)
  • Deeply nested parentheses: CHECK ((((age >= 0))))
  • Mixed case operators: CHECK (age >= 0 AnD age <= 150)
  • Max depth limit: 33+ nesting levels should error
  • Multiple OR conditions

3. Unchecked Byte Index (Potential Panic)

Lines 4456-4468: Need early return if op is empty to prevent panic on byte slicing.

Performance Issues

4. Repeated String Scanning (HIGH PRIORITY)

Lines 4498-4513: Scans string 4 times sequentially for >=, <=, >, <. Should use single-pass parsing.

5. Weak Numeric Validation

Lines 4543-4560: Allows invalid strings like "...", "+-". Need proper state machine validation.

Code Quality

6. Missing Documentation (STYLE.md violation)

CheckCompareOp enum and helper functions need doc comments per .claude/rules/STYLE.md.

7. Integer Overflow Risk

Line 4450: Use usize instead of i32 for depth with saturating_sub.

Code Quality Score: 7/10

  • Correctness: 8/10
  • Performance: 7/10
  • Memory: 10/10 (zero allocations!)
  • Style: 6/10
  • Testing: 5/10

Recommended Actions

Must Fix (Blocking)

  1. Change depth >= MAX to return Result error
  2. Add edge case tests (negative, floats, nesting, max depth)
  3. Add empty op check in split_on_logical_op_case_insensitive

Should Fix (High Priority)

  1. Optimize eval_simple_comparison to single-pass
  2. Add documentation to new functions
  3. Improve extract_numeric_operand validation

Nice to Have

  1. Extract operator constants
  2. Use usize for depth tracking

Overall Verdict: Approve with required changes

The PR correctly solves issue #24 and maintains zero-allocation compliance. Needs additional edge case tests, proper error handling for max depth, and single-pass operator detection.

- Return Result<bool> instead of bool when max depth exceeded
- Use usize with saturating arithmetic for depth counter
- Implement single-pass operator scanning instead of 4 sequential scans
- Strengthen numeric validation with state machine
- Add doc comments to CheckCompareOp enum and helpers
- Support UnaryOp expressions (negative numbers) in expr_to_string
- Add comprehensive edge case tests:
  - Negative numbers: CHECK (temp >= -273.15)
  - Float comparisons with range
  - Deeply nested parentheses
  - Mixed case operators (AnD)
  - Max depth limit with nested AND
  - Multiple OR conditions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Jan 20, 2026

Code Review - PR #38: Fix CHECK Constraint Compound Expressions

Summary

This PR successfully fixes the bug where compound CHECK expressions with AND/OR operators were not being evaluated correctly. The implementation is well-designed with proper recursion handling and zero allocations in the hot path. ✅


🎯 Strengths

1. Zero-Allocation Implementation ⭐

The implementation achieves true zero-allocation in the critical path:

  • All string operations use slices (&str) instead of owned strings
  • No .to_string(), .to_lowercase(), or String allocations
  • Uses as_bytes() and byte-level comparisons throughout
  • Recursive calls pass borrowed references only
  • Excellent adherence to TurDB memory rules

2. Proper Error Handling

  • Changed return type from bool to Result<bool> (database.rs:4405)
  • Uses eyre with rich context in error messages
  • Max depth protection prevents stack overflow
  • Good error messages with column name context

3. Robust Edge Case Handling

  • Case-insensitive operator matching
  • Proper parenthesis depth tracking to avoid splitting inside nested parens
  • Handles negative numbers correctly
  • Float/integer threshold comparison with precision handling
  • Strips outer parentheses iteratively

4. Good Test Coverage

New tests in tests/prepared_statement_constraints.rs cover:

  • Negative numbers (temp >= -273.15)
  • Float ranges (0.0 <= ratio <= 1.0)
  • Deeply nested parentheses, mixed case operators, multiple OR conditions, max depth protection

@claude
Copy link

claude bot commented Jan 20, 2026

🐛 Issues Found

CRITICAL: Existing Test Failures (Likely Pre-existing)

The test suite shows 6 failures related to B-tree operations (NOT related to this PR):

CRITICAL: Keys out of order/dup in split_leaf
- btree::leaf::reproduction_test::test_many_keys_insertion
- btree::simd_scan::tests::test_find_key_simd_matches_scalar_for_many_keys

These appear to be pre-existing issues in the B-tree implementation, not introduced by this PR.

File Length Approaching Limit

src/database/database.rs is now 4739 lines (hard limit is 1800 per STYLE.md).

  • This file desperately needs refactoring into smaller modules
  • Consider extracting CHECK constraint logic to src/database/constraints.rs
  • Not a blocker for this PR, but should be addressed soon

🔍 Code Quality Analysis

database.rs:4463-4490 - split_on_logical_op_case_insensitive

Excellent implementation:
✅ Zero allocations - returns borrowed slices
✅ Proper lifetime annotations
✅ Parenthesis depth tracking prevents incorrect splits
✅ Handles edge cases (empty left/right)

database.rs:4575-4610 - extract_numeric_operand

Well-designed number parser:
✅ Handles negative numbers (-, + prefix)
✅ Handles floats (single decimal point)
✅ Stops at first non-numeric character (solves the original bug!)
✅ Zero allocations until final parse

Minor suggestion: Consider renaming to parse_leading_number for clarity.

database.rs:4612-4633 - compare_value_with_threshold

Smart precision handling:
✅ Avoids floating-point precision issues when comparing integers
✅ Fallback to f64 comparison when needed

@claude
Copy link

claude bot commented Jan 20, 2026

🎨 Style Compliance

✅ Follows ERRORS.md

  • Uses eyre exclusively, no custom error types
  • Rich context in error messages
  • Uses bail! for unrecoverable conditions

✅ Follows MEMORY.md

  • Zero heap allocations in hot path
  • No .clone() or .to_vec() on data
  • Proper use of &[u8] slices

⚠️ STYLE.md Concerns

  • File too large: 4739 lines >> 1800 line limit
  • Function documentation could be improved (missing doc comments on helpers)
  • Consider splitting into submodule

🚀 Performance Characteristics

Time Complexity

  • Best case: O(n) single pass for simple comparisons
  • Average case: O(n log d) where d = expression depth, n = expression length
  • Worst case: O(n * d) for deeply nested expressions with multiple operators

Space Complexity

  • Stack: O(d) for recursion depth (max 32)
  • Heap: O(1) - zero allocations in evaluation! ⭐

Optimization Opportunities

  1. Short-circuit evaluation already implemented

    • OR: returns early if left is true (line 4446)
    • AND: could return early if left is false (suggestion below)
  2. Consider caching parsed expressions (future work)

    • CHECK constraints are evaluated on every INSERT/UPDATE
    • Could parse expression once during table schema load
    • Store AST in ColumnDef instead of string

@claude
Copy link

claude bot commented Jan 20, 2026

📊 Allocation Analysis (Zero-Alloc Target ✅)

Hot Path (per INSERT/UPDATE):

  • evaluate_check_expression: 0 allocations
  • eval_check_expr_with_depth: 0 allocations
  • split_on_logical_op_case_insensitive: 0 allocations
  • extract_numeric_operand: 0 allocations
  • compare_value_with_threshold: 0 allocations

Overall: ZERO-ALLOCATION TARGET ACHIEVED 🎉

Note: str::parse::<f64>() may allocate internally, but it operates on a borrowed slice and allocation is transient (dropped immediately).


🔧 Minor Improvements (Optional)

1. Add AND Short-Circuit (database.rs:4449-4452)

if let Some((left, right)) = Self::split_on_logical_op_case_insensitive(trimmed, b" and ") {
    let left_result = Self::eval_check_expr_with_depth(left, col_name, value, depth + 1)?;
    if !left_result {
        return Ok(false); // No need to evaluate right
    }
    let right_result = Self::eval_check_expr_with_depth(right, col_name, value, depth + 1)?;
    return Ok(right_result); // left is true, return right
}

2. Add Inline Documentation

Add doc comments to helper functions explaining invariants:

/// Splits expression on logical operator at depth 0 (respecting parentheses).
///
/// Returns None if operator not found at depth 0, or if split would create empty parts.
fn split_on_logical_op_case_insensitive<'a>(/* ... */)

3. Consider Extracting to Module (database.rs:4401-4633)

Create src/database/check_constraint.rs to reduce file size.

@claude
Copy link

claude bot commented Jan 20, 2026

✅ Final Verdict

APPROVE with minor suggestions for future work.

What's Great:

✅ Fixes the reported bug completely
✅ Zero allocations in hot path
✅ Proper error handling with eyre
✅ Comprehensive test coverage
✅ Clean, readable code
✅ Handles edge cases (negative numbers, nested parens, case-insensitivity)

Follow-up Items (not blockers):

  1. Extract CHECK constraint logic to separate module (database.rs too large)
  2. Consider short-circuit optimization for AND operator
  3. Add support for =, != operators (future enhancement)
  4. Consider caching parsed expressions (performance optimization)

Pre-existing Issues (not related to this PR):

  • 6 B-tree test failures (investigation needed separately)
  • database.rs file length (4739 lines >> 1800 limit)

Test Coverage Suggestions (follow-up):

  • != operator (currently not handled by find_comparison_operator)
  • = operator (equality checks)
  • NOT operator (NOT (age < 0))
  • Mixed operators (age > 0 OR (salary >= 1000 AND salary <= 5000))
  • Scientific notation (1.5e10)
  • Column vs column comparisons (start_date < end_date)

Great work on this fix! The implementation demonstrates excellent understanding of TurDB's zero-allocation principles and proper Rust idioms. 🚀

@mjm918 mjm918 merged commit 833a8e9 into main Jan 20, 2026
1 check passed
@mjm918 mjm918 deleted the fix/issue-24-check-constraint-compound-expressions branch January 20, 2026 07:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: CHECK constraints not enforced on INSERT

1 participant